Reference Excitation Energies of Increasingly Large Molecules: A QMC Study of Cyanine Dyes

We revisit here the lowest vertical excitations of cyanine dyes using quantum Monte Carlo and leverage recent developments to systematically improve on previous results. In particular, we employ a protocol for the construction of compact and accurate multideterminant Jastrow-Slater wave functions for multiple states, which we have recently validated on the excited-state properties of several small prototypical molecules. Here, we obtain quantum Monte Carlo excitation energies in excellent agreement with high-level coupled cluster for all the cyanines where the coupled cluster method is applicable. Furthermore, we push our protocol to longer chains, demonstrating that quantum Monte Carlo is a viable methodology to establish reference data at system sizes which are hard to reach with other high-end approaches of similar accuracy. Finally, we determine which ingredients are key to an accurate treatment of these challenging systems and rationalize why a description of the excitation based on only active π orbitals lacks the desired accuracy for the shorter chains.


INTRODUCTION
Cyanine dyes are a family of charged π-conjugated molecules which are employed in very diverse applications ranging from dye-synthesized solar cells to the labeling of biomolecules. 1−3 Their characteristic structure consists of a chain of an odd number of carbons with two amine groups at the ends. While their photophysical properties are strongly regulated by the length of the carbon chain, the lowest bright state of the cyanines always maintains a π → π* character and can be predominantly described as a HOMO to LUMO (HL) transition. Despite the apparent simplicity of this excitation, its accurate treatment is known to be challenging, and consequently, cyanine dyes have often been used as model systems to assess the quality of electronic structure methods for excited states. 4−12 Here, we employ quantum Monte Carlo (QMC) to revisit the vertical excitation energies of cyanine dyes of the simple form C n H n (NH 2 ) 2 + with n an odd number ranging from 1 to 17, combining the use of sophisticated multideterminant wave functions with recent developments for their efficient optimization in variational Monte Carlo (VMC). 13 −16 In particular, we build on our successful treatment at chemical accuracy of the excitation energies and optimal excited-state structures of small, prototypical molecules, 17−19 where the determinantal components of the multiple states are generated in an automatic and balanced manner with the configuration interaction using a perturbative selection made iteratively (CIPSI) approach. 20 Studying the bright excitation of cyanine dyes enables us to demonstrate the accuracy of our protocol for the shorter chains, where high-level coupled cluster (CC) offers a good compromise in terms of accuracy versus computational cost. Importantly, it also establishes the applicability of QMC to larger sizes where the use of other high-level approaches is more challenging. Finally, we identify the key descriptors of orbital correlations for these systems and elucidate why earlier QMC studies with limited active space wave functions lacked the expected accuracy. 4

METHODS
We employ QMC wave functions of the so-called Jastrow-Slater form, namely (1) where is the Jastrow correlation factor, and D i are determinants of single-particle orbitals. The Jastrow factor explicitly depends on the interparticle coordinates and includes here electron−electron and electron−nucleus correlation terms. 21 To generate the determinantal components for the two states, we employ the CIPSI approach which, starting from a given reference space, builds expansions by iteratively selecting determinants based on their second-order perturbation (PT2) energy contribution obtained via the Epstein-Nesbet partitioning of the Hamiltonian 22,23 where Ψ CIPSI is the current CIPSI wave function for the state under consideration, and |α⟩ denotes a determinant outside the current CI space. Since the ground and excited states of the cyanines have different symmetries, a state-specific approach can be used to perform the selection for the two states separately, using different orbitals.
We are here interested in computing excitation energies and, therefore, wish to achieve a balanced CIPSI description of the states of interest, which leads to converged excitation energies in QMC already for relatively small expansions. A measure of the quality of a given CIPSI wave function is its PT2 energy contribution, which represents an approximate estimate of the error of the expansion with respect to the full CI (FCI) limit. Therefore, we can compute the excitation energies using expansions for the two (or more generally multiple) states with matched PT2 energy and, therefore, ensure comparable quality. We refer the reader to ref 19 on how to impose the "iso-PT2" criterion when treating multiple states of the same symmetry expanded on a common set of determinants.
Alternatively, one can match the CI variance of the relevant states, which is defined as the variance of the FCI Hamiltonian: As the CIPSI wave function approaches the FCI limit, the CI variance goes to zero. For various small molecules, 17−19 we have found that matching the PT2 energy contributions leads to expansions with also very similar variances. In general, this is not always the case, and one of the two criteria might be more suitable than the other for the computation of the CI excitation energies of a particular system. While we discuss in detail below the impact of this choice on the QMC excitation energies, we stress already here that the convergence of the QMC results is established not based on their agreement with available reference data but in an "internally consistent" manner based on the similarity of the VMC and DMC excitation energies 19 and their convergence with respect to the number of determinants.
Finally, as an alternative to the CIPSI expansions, we test complete active space (CAS) expansions for the determinantal components of our QMC wave functions. We start from separate CASSCF calculations for the two states and consider minimal active spaces by correlating the π electrons in the π orbitals constructed from the 2p z orbitals. For the smaller cyanines with up to 7 heavy atoms, CN3−CN7 (we label a cyanine as CNm with m the total number of C and N atoms), we also explore the use of a larger active space with molecular orbitals constructed from the 2p z and 3p z atomic orbitals. Finally, in some cases, we also test the performance of a simple one-configuration ansatz, namely, the Hartree−Fock (HF) and HOMO−LUMO (HL) configurations for the ground and excited states, respectively.

COMPUTATIONAL DETAILS
Unless otherwise specified, we employ scalar-relativistic energy-consistent HF pseudopotentials and the correlationconsistent Gaussian basis sets specifically constructed for these pseudopotentials. 24,25 For most of the calculations, we use a double-ζ basis set minimally augmented with s and p diffuse functions on the heavy atoms and denoted here as maug-cc-pVDZ. Convergence tests are performed with the fully augmented aug-cc-pVTZ basis set. The exponents of the diffuse functions are taken from the corresponding all-electron Dunning's correlation-consistent basis sets. 26 The HF and CASSCF computations are performed with the program GAMESS(US) . 27,28 When using the CASSCF wave functions in QMC, we truncate the CAS expansion for CN11 and CN13, using a threshold on the CSF coefficients so that the configurations make up respectively about 0.9985 and 0.9765 of the weight of the total wave functions of the two states. The CIPSI expansions are generated with Quantum Package 29 and constructed to be eigenstates of Ŝ2. 17 We perform the selection for the two states separately, starting from CASSCF orbitals obtained with the larger active spaces for the cyanine molecules up to CN7 and the minimal CAS from CN9 to CN15. We use the HF orbitals for CN17 and CN19. As shown in Figure S1 and Table S5 for CN3 and Figure S6 for CN15, the use of different orbitals to generate the CIPSI expansions has no appreciable impact on the CI or QMC excitation energies.
The QMC calculations are carried out with the CHAMP code. 30 The determinantal part of our QMC wave functions is expressed in terms of spin-adapted configuration state functions (CSF) to reduce the number of parameters during the VMC optimization. In the wave function optimization, we sample a guiding wave function that differs from the current wave function close to the nodes 31 to guarantee finite variances of the estimators of the gradients with respect to the wave function parameters. All wave function parameters (Jastrow, CI, and orbital coefficients) are optimized in state-specific energy minimization following the stochastic reconfiguration scheme. 14,32 In the DMC calculations, we treat the pseudopotentials beyond the locality approximation using the T-move algorithm 33 and employ an imaginary time-step of 0.05 au which we have already tested for one of the cyanine chains and shown to yield excitation energies converged to better than 0.01 eV. 18 We compute all energies on the ground-state geometries of CN3−CN11 determined with all-electron PBE0/cc-pVQZ in ref 8 and obtain the geometries for CN13 to CN19 at the same level of theory with the Gaussian 09 program. 34 We employ the programs CFour v2.1 35 and Molcas 36 for the approximate coupled cluster singles and doubles (CC2) and singles, doubles, and triples models (CC3) and the CASPT2 calculations, respectively, using the all-electron aug-cc-pVDZ basis set and the frozen-core approximation, unless otherwise specified.

RESULTS
We compute the lowest π → π* vertical excitation energy of cyanine dyes of the form C n H n (NH 2 ) 2 + with n ranging from 1 up to 17. The structures of the CN3 and CN9 molecules are shown in Figure 1. In all cases, the point group of the molecule is C 2v with the ground (GS) and excited (ES) states having A 1 and B 1 symmetry, respectively.
For CN3 up to CN15, we compare the QMC excitation energies with the all-electron CC3/aug-cc-pVDZ results. The use of the CC3 method as reference for the bright excitation of these systems is supported by the agreement of the CC3 excitation energies with the corresponding extrapolated FCI (exFCI) estimates in a small basis of the smaller CN3 and CN5 to better than 0.05 eV. 12 Employing the aug-cc-pVDZ basis set is sufficient given the agreement with the corresponding aug-cc-pVTZ values (see Table S1). Importantly, the all-electron CC3/aug-cc-pVDZ excitation energies are very close to the BFD CC3/aug-cc-pVDZ values, confirming that the use of pseudopotentials does not introduce appreciable errors. The reference CC3/aug-cc-pVDZ values also agree with the corresponding CC3 excitation energies computed with the BFD maug-cc-pVDZ basis set for all cyanines except the smallest CN3 (see Table S1), where a fully augmented double-ζ basis is needed also in the BFD calculations.
For dyes larger than CN15, we are however not able to run the CC3 calculations due to memory requirements, 37 and the DMC excitation energy with our best CIPSI wave function becomes then the reference for other calculations.
4.1. Building the Expansions. To compute accurate QMC excitation energies for the cyanine dyes, one needs balanced Jastrow-Slater wave functions to describe the ground and excited states. This is achieved in two stages, where the first is the construction of CIPSI expansions with the iso-PT2 and/or iso-variance scheme, and the second is a validation criterion that the resulting excitation energies in VMC and DMC are close to each other and converged with respect to the number of determinants.
In particular, we generate the ground-and excited-state expansions at the CIPSI level to have either matched PT2 energy corrections or CI variances, which we use as measures of the "distance" of the wave functions from the FCI limit. Imposing that the determinantal components satisfy either the iso-PT2 or iso-variance criterion was previously found to lead to QMC excitation energies which were converged to the best reference values with a handful of determinants, 17,18 even when the error on the starting CI excitation energy was relatively large. 19 In Figure 1, we illustrate the convergence of the CI excitation energies of CN3 and CN9 versus the total number of determinants for expansions characterized by similar PT2 corrections or CI variances. For CN3, the iso-PT2 construction leads to a somewhat faster convergence of the excitation energy for small expansions, but the two criteria become quickly equivalent beyond a few 1000 determinants. The situation is reversed for CN9, where matching the PT2 correction yields a much slower converging CI excitation energy, while the iso-variance criterion leads to a good agreement with the CC3 value in the same basis set for little more than 1000 determinants. In fact, we find that variancematched expansions yield a faster converging CI excitation energy starting from CN7 and that, surprisingly, fewer determinants are needed to obtain a good estimate for the larger system sizes considered (see Figure S4). Consequently, Figure 1. CI vertical excitation energies of CN3 (top) and CN9 (bottom) versus the total number of determinants, computed for ground-and excited-state CIPSI expansions having either matched PT2 energy contributions or CI variances. The VMC and DMC excitation energies obtained using the iso-variance expansions are also shown (the statistical error is smaller than the symbol size). The BFD pseudopotentials and the maug-cc-pVDZ basis are used here also for the CC3 calculations. the CI treatment of the smallest cyanine, CN3, appears to be the most difficult as further elaborated in section 4.3.
Importantly, in Figure 1, we also show that QMC largely corrects for possible shortcomings of the starting CIPSI expansions, yielding excitation energies which display a rather small dependence on the number of determinants, especially at the DMC level. For CN3 and small expansions, where the isovariance criterion significantly overestimates the CI excitation energy, VMC and DMC reduce the error at the CI level by about 0.2 and 0.3 eV, respectively. As the expansions become larger, the difference between the VMC and the DMC values diminishes, falling well below chemical accuracy (about 0.05 eV) for both CN3 and CN9. The robustness of the QMC results is further corroborated for CN7 in Table S6, where we show that, for a comparable number of determinants, the use of PT2-and variance-matched wave functions yields excitation energies which differ by about 0.2 eV at the CI level but are very close in VMC and completely equivalent in DMC.

Best QMC Vertical Excitations.
In Table 1, we summarize the VMC and DMC excitation energies of all cyanine dyes obtained with the largest CIPSI expansions of Table S6 and the iso-variance selection criterion. We also list the QMC and CASPT2 excitation energies computed with minimal CAS expansions, together with our CC2 and CC3 results and the exFCI estimates from the literature. 12 We refer the reader to Table S6 for additional QMC calculations with different numbers of determinants in the Jastrow-CIPSI wave functions.
For the reported CIPSI expansions, the VMC and DMC excitation energies are very close and also agree within chemical accuracy with the CC3 and exFCI values in all cases where these methods are applicable. This is in line with our previous findings that the agreement between VMC and DMC excitation energies is a strong indication of the balanced quality of the corresponding wave functions. 19 Furthermore, we find that the QMC values for the larger dyes are in very good agreement with the estimates given by the extrapolation of the CC3 results as a function of the number of electrons (see the SI). Since DMC can be employed in all cases, we plot all excitation energies in Figure 2 in terms of their distance to the DMC-CIPSI results, which we use as reference values.
The CASPT2 and QMC-CAS energies computed with the minimal active spaces are instead very different from the DMC-CIPSI results: CASPT2 always underestimates the excitation energies, whereas QMC-CAS tends to overestimate them, similarly to what was reported for CAS wave functions in ref 4. For CN3−CN7, we test the effect of including more π orbitals in the active space, which somewhat ameliorates the VMC excitation energies but does not sufficiently affect the DMC values, which remain far from the DMC-CIPSI reference (see Table S7). Interestingly, we note that both the CASPT2 and QMC-CAS methods approach the best DMC results as the size of the molecule increases, suggesting an easier treatment of the longer chains as already found at the CI level and as further discussed below.
4.3. Capturing Orbital Correlation. To understand the different performance of CAS and CIPSI expansions when used in QMC wave functions, we focus here on CN3 and analyze in Figure 3 the VMC and DMC vertical excitation energies calculated using different determinantal components in the trial wave functions. As already mentioned, despite being the smallest cyanine dye, CN3 appears to be the most challenging one: the use of Jastrow-CAS wave functions leads to quite big errors. and the number of CIPSI determinants needed to converge the excitation energy is larger than for the longer dyes.
The simplest QMC calculations are performed with a oneconfiguration (HF/HL) wave function and the maug-cc-pVDZ basis set. We then proceed to CAS determinantal components and CIPSI expansions also employing the aug-cc-pVTZ basis set. The VMC excitation energy computed with the minimal CAS wave functions is worse than the HF/HL value since the active space comprises more determinants for the ground state but only the HL configuration for the excited state. DMC ameliorates the result, but using a larger CAS space on the π orbitals only marginally helps (see Table S7). On the other hand, with the CIPSI selected determinants, we have a considerable improvement on the excitation energy, and with the use of just a few hundred determinants, the DMC error reduces to less than 0.1 eV. Employing larger expansions with the maug-cc-pVDZ basis set, we finally converge to VMC values which are consistent with the DMC ones and approximately 0.04 eV higher than the reference. The use of the aug-cc-pVTZ basis set further reduces the excitation energy by about 0.02 eV. We note that, for the longer cyanine chains, the smaller maug-cc-pVDZ basis set is found to be sufficient for the computation of this excitation energy. 18 The superior performance of the use of a CIPSI with respect to the CAS expansions in QMC indicates that some key  descriptor of correlation is missing from the active space and is not recovered through the addition of the Jastrow factor and the subsequent full optimization in VMC nor through a DMC calculation with the optimal Jastrow-Slater wave function. In this work as in ref 4, the active space is chosen to correlate the π electrons in the π orbitals. From the QMC-CIPSI results, we can therefore infer that, while the excitation of interest is predominantly of π → π* character, other orbital correlations are important and cannot be omitted in the QMC wave function of the shorter cyanines.
To better understand this, we present a CI study for CN3− CN7 with the small 6-31G basis set in Table 2. We correlate only the valence electrons and use state-average natural orbitals obtained with a preliminary calculation at the CIPSI level. For each state, we compute the energy with only one CSF, and on top of this configuration, we perform a CAS-CI calculation restricted to the σ and the π orbitals in a CAS-σ and CAS-π, respectively. The reference FCI excitation energy in this basis and the associated confidence interval are computed following the scheme presented in ref 38 rather than extrapolating the variational energies of the individual states in the limit of the PT2 energy correction going to zero. Indeed, the uncertainties of the extrapolated FCI energies of both states are larger than the uncertainty on the estimated excitation energy computed with this scheme. Since the CC3 estimate for CN3 and CN5 is in excellent agreement with the FCI value, we use the CC3 excitation energy as reference for CN7. We note that, because of the use of the simple 6-31G basis set, the FCI and CC3 excitation energies are much higher than the more accurate results presented above, but this is not relevant for the present discussion.
For the CN3 molecule, the excitation energy obtained with a single CSF for each state is 8.91 eV, namely, higher by 1.4 eV than the FCI result. The CAS-π calculation corrects only 60% of the error, indicating that the σ orbitals also play an important role in the stabilization of the excited state. Similarly, the excitation energy obtained with the CAS-σ improves the excitation energy with respect to the single CSF by recovering about 31% of the error. These results indicate the importance of both σ and π orbitals in the calculation of the excitation energy of CN3.
Therefore, to partially account for both π and σ correlations, we perform a multireference CI calculation, applying all possible single and double excitations to the CAS-π determinants. Such a CAS-π+SD-σ calculation also enables the relaxation of the CAS-π CI coefficients in the presence of most of the σ correlation. The resulting excitation energy is now significantly improved but still 0.1 eV higher than the FCI reference, confirming that a similar computational effort needs to be made for the π and σ orbitals. This justifies the use of CIPSI where the most important Slater determinants will be chosen to describe σ, π, and σ−π correlation in a "democratic" way based on their contribution to the second-order perturbation energy.
For CN5, the situation is somewhat different. While the single CSF still overestimates the excitation energy by 1.47 eV, the CAS-π wave function behaves better than for CN3, recovering 80% of the error. Consequently, omitting the σ orbitals in the active space results in an excitation energy closer to the reference than in the CN3 case. Once the σ orbitals are introduced, as for CN3, we improve the excitation energy but still observe an overestimation of the CAS-π+SD-σ result by almost 0.2 eV, pointing to the importance of describing the σ as well as the π correlation. The situation for CN7 is similar as for CN5, suggesting that the whole series behaves like CN5 and that CN3 is an exception because of the particularly small length of the chain.

CONCLUSION
We have presented a QMC benchmark study of the lowest vertical excitation energies of cyanine chains. We constructed the determinantal components of the Jastrow-Slater wave functions through an automatic selected-CI procedure and obtained a balanced description of the relevant states by ensuring similar quality of the corresponding expansions, for instance by matching their CI variances. With compact We also applied our protocol to longer cyanines and validated the accuracy of our estimates via the consistent closeness of the determined VMC and DMC excitation energies. Finally, we showed that key to a successful description of this excitation over all chain lengths is to account for π, σ, and σ−π correlations, therefore going beyond a CAS treatment based on π-orbitals only. In conclusion, we believe that the present study further establishes QMC methods as accurate and robust tools for the treatment of excited states of relatively large systems and parameter spaces.